INRIA a CCSD electronic archive server

webPRANK: a phylogeny-aware multiple sequence aligner with interactive alignment browser

Author: A Löytynoja
A Löytynoja
A Löytynoja
Ari Löytynoja
B Paten
C Dessimoz
C Kosiol
D Maddison
H McWilliam
J Felsenstein
K Wong
M Hasegawa
Nick Goldman
R Development Core Team
S Whelan
W Fletcher
W Pearson
Z Yang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Phylogeny-aware progressive alignment has been found to perform well in phylogenetic alignment benchmarks and to produce superior alignments for the inference of selection on codon sequences. Its implementation in the PRANK alignment program package also allows modelling of complex evolutionary processes and inference of posterior probabilities for sequence sites evolving under each distinct scenario, either simultaneously with the alignment of sequences or as a post-processing step for an existing alignment. This has led to software with many advanced features, and users may find it difficult to generate optimal alignments, visualise the full information in their alignment results, or post-process these results, e.g. by objectively selecting subsets of alignment sites.
Results: We have created a web server called webPRANK that provides an easy-to-use interface to the PRANK phylogeny-aware alignment algorithm. The webPRANK server supports the alignment of DNA, protein and codon sequences as well as protein-translated alignment of cDNAs, and includes built-in structure models for the alignment of genomic sequences. The resulting alignments can be exported in various formats widely used in evolutionary sequence analyses. The webPRANK server also includes a powerful web-based alignment browser for the visualisation and post-processing of the results in the context of a cladogram relating the sequences, allowing (e.g.) removal of alignment columns with low posterior reliability. In addition to de novo alignments, webPRANK can be used for the inference of ancestral sequences with phylogenetically realistic gap patterns, and for the annotation and post-processing of existing alignments. The webPRANK server is freely available on the web at http://tinyurl.com/webprank.
Conclusions: The webPRANK server incorporates phylogeny-aware multiple sequence alignment, visualisation and post-processing in an easy-to-use web interface. It widens the user base of phylogeny-aware multiple sequence alignment and allows the performance of all alignment-related activity for small sequence analysis projects using only a standard web browser.
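
The post-processing step mentioned in this abstract, removing alignment columns with low posterior reliability, can be illustrated with a minimal sketch. This is not webPRANK's code or data format: the function name `filter_columns`, the dictionary-based alignment, the per-column `reliability` scores and the 0.8 threshold are all assumptions made for illustration.

```python
def filter_columns(alignment, reliability, threshold=0.8):
    """Keep only alignment columns whose reliability score meets the threshold.

    alignment   -- dict mapping sequence name to aligned sequence (equal lengths)
    reliability -- one score in [0, 1] per alignment column (hypothetical input)
    threshold   -- minimum score for a column to be kept (illustrative default)
    """
    n_cols = len(reliability)
    for name, seq in alignment.items():
        if len(seq) != n_cols:
            raise ValueError(f"{name}: expected {n_cols} columns, got {len(seq)}")
    # Indices of the columns that pass the reliability cut-off.
    keep = [i for i, score in enumerate(reliability) if score >= threshold]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy two-sequence alignment; the gapped fourth column has a low score.
alignment = {"human": "ACG-T", "chimp": "ACGAT"}
reliability = [0.99, 0.95, 0.90, 0.30, 0.97]
filtered = filter_columns(alignment, reliability, threshold=0.8)
print(filtered)
```

In this toy input only the fourth column falls below the threshold, so both filtered sequences lose exactly that column.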

arXiv.org e-Print Archive

Selective Constraints on Amino Acids Estimated by a Mechanistic Codon Substitution Model with Multiple Nucleotide Changes

Author: A Doron-Faigenboim
A Schneider
AL Halpern
AR Kinjo
C Kosiol
Darren Martin
DT Jones
G Bazykin
GC Conant
H Akaike
I Keller
J Adachi
J Adachi
JP Huelsenbeck
K Tamura
L Jin
M Anisimova
M Averof
M Hasegawa
M Kimura
MA Larkin
MO Dayhoff
MW Dimmic
N Goldman
N Rodrigue
N Takahata
NGC Smith
R Grantham
S Guindon
S Miyazawa
S Whelan
S Whelan
S Whelan
Sanzo Miyazawa
SC Choi
SQ Le
SV Muse
T Miyata
T Miyata
TK Seo
TK Seo
W Delport
W Delport
Z Yang
Z Yang
Z Yang
Z Yang
Publication venue: Public Library of Science (PLoS)
Publication date: 18/03/2011

Empirical substitution matrices represent the average tendencies of substitutions over various protein families by sacrificing gene-level resolution. We develop a codon-based model, in which mutational tendencies of codon, a genetic code, and the strength of selective constraints against amino acid replacements can be tailored to a given gene. First, selective constraints averaged over proteins are estimated by maximizing the likelihood of each 1-PAM matrix of empirical amino acid (JTT, WAG, and LG) and codon (KHG) substitution matrices. Then, selective constraints specific to given proteins are approximated as a linear function of those estimated from the empirical substitution matrices. Akaike information criterion (AIC) values indicate that a model allowing multiple nucleotide changes fits the empirical substitution matrices significantly better. Also, the ML estimates of transition-transversion bias obtained from these empirical matrices are not so large as previously estimated. The selective constraints are characteristic of proteins rather than species. However, their relative strengths among amino acid pairs can be approximated not to depend very much on protein families but amino acid pairs, because the present model, in which selective constraints are approximated to be a linear function of those estimated from the JTT/WAG/LG/KHG matrices, can provide a good fit to other empirical substitution matrices including cpREV for chloroplast proteins and mtREV for vertebrate mitochondrial proteins. The present codon-based model with the ML estimates of selective constraints and with adjustable mutation rates of nucleotide would be useful as a simple substitution model in ML and Bayesian inferences of molecular phylogenetic trees, and enables us to obtain biologically meaningful information at both nucleotide and amino acid levels from codon and protein sequences.
Comment: Table 9 in this article includes corrections for errata in the Table 9 published in 10.1371/journal.pone.0017244. Supporting information is attached at the end of the article, and a computer-readable dataset of the ML estimates of selective constraints is available from 10.1371/journal.pone.001724
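
The model comparison by AIC described in this abstract follows the standard definition AIC = 2k − 2 ln L, where k is the number of free parameters and L the maximized likelihood. The sketch below only shows the arithmetic; the model names, log-likelihoods and parameter counts are hypothetical values chosen for illustration and are not taken from the paper.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fits: (maximized log-likelihood, number of free parameters).
fits = {
    "single_nucleotide_changes_only": (-1000.0, 10),
    "multiple_nucleotide_changes": (-980.0, 15),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: AIC = {score:.1f}")
print("preferred model:", best)
```

With these made-up numbers the richer model gains 20 log-likelihood units for 5 extra parameters, so its AIC is lower by 30 and it is preferred despite the complexity penalty.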

Public Library of Science (PLOS)

Correcting the Bias of Empirical Frequency Parameter Estimators in Codon Models

Author: C Kosiol
C Seoighe
G Schwarz
GC Conant
Konrad Scheffler
M Anisimova
M Lacerda
N Goldman
S Whelan
Sergei Kosakovsky Pond
SL Kosakovsky Pond
SL Kosakovsky Pond
SL Kosakovsky Pond
Spencer V. Muse
SV Muse
Thomas Mailund
W Delport
Wayne Delport
Publication venue: Public Library of Science
Publication date: 01/01/2010

Markov models of codon substitution are powerful inferential tools for studying biological processes such as natural selection and preferences in amino acid substitution. The equilibrium character distributions of these models are almost always estimated using nucleotide frequencies observed in a sequence alignment, primarily as a matter of historical convention. In this note, we demonstrate that a popular class of such estimators are biased, and that this bias has an adverse effect on goodness of fit and estimates of substitution rates. We propose a “corrected” empirical estimator that begins with observed nucleotide counts, but accounts for the nucleotide composition of stop codons. We show via simulation that the corrected estimates outperform the de facto standard estimates not just by providing better estimates of the frequencies themselves, but also by leading to improved estimation of other parameters in the evolutionary models. On a curated collection of sequence alignments, our estimators show a significant improvement in goodness of fit compared to the F3×4 approach. Maximum likelihood estimation of the frequency parameters appears to be warranted in many cases, albeit at a greater computational cost. Our results demonstrate that there is little justification, either statistical or computational, for continued use of the F3×4-style estimators.
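
The source of the bias can be seen in a small numerical sketch. It assumes the widely used F3×4 construction (codon frequency proportional to the product of position-specific nucleotide frequencies, renormalized over sense codons) and the universal genetic code with stop codons TAA, TAG and TGA. It does not implement the corrected estimator proposed in the paper, and the function names are illustrative.

```python
from itertools import product

NUCS = "ACGT"
STOPS = {"TAA", "TAG", "TGA"}  # assumption: universal genetic code


def f3x4(pos_freqs):
    """F3x4-style codon frequencies from three per-position nucleotide distributions.

    pos_freqs -- list of three dicts mapping nucleotide to frequency.
    Returns a dict over the 61 sense codons that sums to 1.
    """
    raw = {}
    for a, b, c in product(NUCS, repeat=3):
        codon = a + b + c
        if codon in STOPS:
            continue  # stop codons are excluded, then the rest is renormalized
        raw[codon] = pos_freqs[0][a] * pos_freqs[1][b] * pos_freqs[2][c]
    total = sum(raw.values())
    return {codon: value / total for codon, value in raw.items()}


def implied_position_freqs(codon_freqs):
    """Per-position nucleotide frequencies implied by a codon distribution."""
    out = [dict.fromkeys(NUCS, 0.0) for _ in range(3)]
    for codon, freq in codon_freqs.items():
        for position, nuc in enumerate(codon):
            out[position][nuc] += freq
    return out


# Suppose the alignment shows uniform nucleotide frequencies at every position.
uniform = {nuc: 0.25 for nuc in NUCS}
observed = [dict(uniform), dict(uniform), dict(uniform)]

codon_freqs = f3x4(observed)
implied = implied_position_freqs(codon_freqs)
print("observed T at position 1:", observed[0]["T"])
print("implied  T at position 1:", round(implied[0]["T"], 4))
```

Because all three stop codons begin with T, dropping them and renormalizing leaves only 13 of the 61 sense codons starting with T. The codon model therefore implies a first-position T frequency of 13/61 rather than the observed 0.25, which is the kind of mismatch a stop-codon-aware estimator is meant to remove.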

CiteSeerX

Stellenbosch University SUNScholar Repository

A Model-Based Analysis of GC-Biased Gene Conversion in the Human and Chimpanzee Genomes

Author: A Auton
A Kong
A Navarro
A Necşulea
A Ratnakumar
A Siepel
Adam Siepel
AJ Jeffreys
AJ Webb
AP Boyle
BC Lamb
C Kosiol
CC Spencer
CF Mugal
D Karolchik
D Kostka
Dennis Kostka
E Mancera
G Marais
Graham Coop
J Berglund
J Harrow
J Romiguier
JA Capra
JM Chen
John A. Capra
JW IJdo
K Lindblad-Toh
K Pollard
Katherine S. Pollard
L Arbiza
L Duret
L Duret
LR Meyer
M Blanchette
M Hasegawa
Melissa J. Hubisz
MJ Hubisz
N Galtier
N Galtier
N Lartillot
P Flicek
P Stenson
RD George
S Glémin
S Katzman
S Katzman
S Myers
S Myers
SE Ptak
ST Sherry
T Nagylaki
TC Brown
TR Dreszer
W Winckler
Y Zhang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts about 1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. They are also significantly enriched for disease-associated polymorphisms, suggesting that they contribute to the fixation of deleterious alleles. The gBGC tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages. They supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available. © 2013 Capra et al

arXiv.org e-Print Archive

Cold Spring Harbor Laboratory Institutional Repository

eScholarship - University of California

D-Scholarship@Pitt

FigShare

A model-independent approach to infer hierarchical codon substitution dynamics

Author: A Jiménez-Sánchez
AS Novozhilov
C Kosiol
C Kosiol
CR Woese
CR Woese
DG Hwang
DS Riddle
DT Jones
E Trifonov
EN Trifonov
EN Trifonov
EN Trifonov
FH Crick
GH Gonnet
HA Simon
JEM Hornos
JG Kemeny
JR Jungck
JTF Wong
M Di Giulio
M Meilă
MA Jiménez-Montano
MA Jiménez-Montaño
Martin Nilsson Jacobi
MN Jacobi
MO Dayhoff
MS Johnson
MW Nirenberg
O Görnerup
O R
Olof Görnerup
R Marquez
S Itzkovitz
S Whelan
SD Copley
T Bollenbach
T Wilhelm
TD Wu
V Karasev
VR Chechetkin
W Taylor
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Codon substitution constitutes a fundamental process in molecular biology that has been studied extensively. However, prior studies rely on various assumptions, e.g. regarding the relevance of specific biochemical properties, or on conservation criteria for defining substitution groups. Ideally, one would instead like to analyze the substitution process in terms of raw dynamics, independently of underlying system specifics. In this paper we propose a method for doing this by identifying groups of codons and amino acids such that these groups imply closed dynamics. The approach relies on recently developed spectral and agglomerative techniques for identifying hierarchical organization in dynamical systems.
Results: We have applied the techniques on an empirically derived Markov model of the codon substitution process that is provided in the literature. Without system specific knowledge of the substitution process, the techniques manage to "blindly" identify multiple levels of dynamics; from amino acid substitutions (via the standard genetic code) to higher order dynamics on the level of amino acid groups. We hypothesize that the acquired groups reflect earlier versions of the genetic code.
Conclusions: The results demonstrate the applicability of the techniques. Due to their generality, we believe that they can be used to coarse grain and identify hierarchical organization in a broad range of other biological systems and processes, such as protein interaction networks, genetic regulatory networks and food webs.

Chalmers Publication Library

Chalmers Research

Selective Pressure to Increase Charge in Immunodominant Epitopes of the H3 Hemagglutinin Influenza Protein

Author: AC Lowen
C Kosiol
DJ Smith
DJ Smith
ET Muñoz
Gregory J. Tobin
H Zhou
Haoxin Sun
J Adachi
J Sun
J Sun
JA Leunissen
JH McDonald
Jinxue Long
JL Thorne
K Pan
Keyao Pan
Michael W. Deem
N Goldman
N Sinha
NV Kaverin
Peter L. Nara
S Henikoff
S Nakajima
S Veerassamy
SA Frank
T Clackson
T Müller
T Müller
V Gupta
VN Minin
YY Tseng
Publication venue: Springer-Verlag
Publication date: 01/01/2010

The evolutionary speed and the consequent immune escape of H3N2 influenza A virus make it an interesting evolutionary system. Charged amino acid residues are often significant contributors to the free energy of binding for protein–protein interactions, including antibody–antigen binding and ligand–receptor binding. We used Markov chain theory and maximum likelihood estimation to model the evolution of the number of charged amino acids on the dominant epitope in the hemagglutinin protein of circulating H3N2 virus strains. The number of charged amino acids increased in the dominant epitope B of the H3N2 virus since introduction in humans in 1968. When epitope A became dominant in 1989, the number of charged amino acids increased in epitope A and decreased in epitope B. Interestingly, the number of charged residues in the dominant epitope of the dominant circulating strain is never fewer than that in the vaccine strain. We propose that these results indicate selective pressure for charged amino acids that increase the affinity of the virus epitope for water and decrease the affinity for host antibodies. The standard PAM model of generic protein evolution is unable to capture these trends. The reduced alphabet Markov model (RAMM) we introduce captures the increased selective pressure for charged amino acids in the dominant epitope of hemagglutinin of H3N2 influenza (R² > 0.98 between 1968 and 1988). The RAMM calibrated to historical H3N2 influenza virus evolution in humans fit well to the H3N2/Wyoming virus evolution data from guinea pig animal model studies.

PoPoolation: A Toolbox for Population Genetic Analysis of Next Generation Sequencing Data from Pooled Individuals

Author: A Futschik
AA Out
AC Darling
Andreas Futschik
AR Quinlan
B Charlesworth
B Haubold
C Trapnell
Carolin Kosiol
Christian Schlötterer
CJ Rubin
D Karolchik
DC Koboldt
DJ Begun
DR Zerbino
F Tajima
H Li
H Li
H Li
I Birol
J Rozowsky
JC Dohm
JF Degner
JH McDonald
Manfred Kayser
Nicola De Maio
Pablo Orozco-terWengel
Ram Vinay Pandey
Robert Kofler
RR Hudson
RR Hudson
S Hutter
S Tweedie
TB Sackton
TE Druley
TL Turner
Viola Nolte
Y Hu
Y Zhang
Publication venue: Public Library of Science
Publication date: 01/01/2011

Recent statistical analyses suggest that sequencing of pooled samples provides a cost effective approach to determine genome-wide population genetic parameters. Here we introduce PoPoolation, a toolbox specifically designed for the population genetic analysis of sequence data from pooled individuals. PoPoolation calculates estimates of θWatterson, θπ, and Tajima's D that account for the bias introduced by pooling and sequencing errors, as well as divergence between species. Results of genome-wide analyses can be graphically displayed in a sliding window plot. PoPoolation is written in Perl and R and it builds on commonly used data formats. Its source code can be downloaded from http://code.google.com/p/popoolation/. Furthermore, we evaluate the influence of mapping algorithms, sequencing errors, and read coverage on the accuracy of population genetic parameter estimates from pooled data
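
For orientation, the three statistics named in this abstract can be written down in their classical form for individually sequenced samples. This is a minimal sketch only: it does not include the corrections for pooling and sequencing error that PoPoolation applies, it is written in Python rather than the toolbox's Perl and R, and the function names and toy sequences are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Number of alignment columns with more than one allele."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def theta_pi(seqs):
    """Mean number of pairwise differences between sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs)


def theta_watterson(seqs):
    """Watterson's estimator: S divided by the harmonic number a1."""
    a1 = sum(1.0 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a1


def tajimas_d(seqs):
    """Classical Tajima's D (Tajima 1989), with no pooling correction."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (theta_pi(seqs) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy sample: three singleton variants, i.e. an excess of rare alleles.
sample = ["AAAA", "AAAT", "AATA", "ATAA"]
pi = theta_pi(sample)
theta_w = theta_watterson(sample)
d = tajimas_d(sample)
print(f"theta_pi = {pi:.3f}, theta_W = {theta_w:.3f}, D = {d:.3f}")
```

The toy sample contains only singletons, so the pairwise diversity falls below Watterson's estimate and D comes out negative, the usual signature of an excess of rare variants.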

CiteSeerX

Online Research @ Cardiff

Permanent Hosting, Archiving and Indexing of Digital Resources and Assets

Oxford University Research Archive

Changes in Gene Expression Associated with Reproductive Maturation in Wild Female Baboons

Author: Abulafia
Adib
Alberts
Alberts
Altmann
Babbitt
Barberan-Soler
Behrensmeyer
Benjamini
Blekhman
Blurton Jones
Bogin
Brown
Buchan
Bukovsky
Courtney C. Babbitt
Gene Ontology Consortium
Gregory A. Wray
Gunz
Hahm
Harris
Haviv
Haygood
Haygood
Hoang
Jenny Tung
Jethwa
Jo
Jolly
Kosiol
Kostka
Langmead
Luderer
Luo
Mace
Matzuk
McGee
Mi
Nef
Nielsen
Paciga
Potts
R Development Core Team
Redmer
Revil
Rhine
Robinson
Silva
Small
Somel
Susan C. Alberts
Swanson
Taylor
Tomaselli
Tung
Tung
Uddin
Vallender
Wang
Wasser
Welsh
Xu
Yang
Publication venue: Oxford University Press
Publication date

Changes in gene expression during development play an important role in shaping morphological and behavioral differences, including between humans and nonhuman primates. Although many of the most striking developmental changes occur during early development, reproductive maturation represents another critical window in primate life history. However, this process is difficult to study at the molecular level in natural primate populations. Here, we took advantage of ovarian samples made available through an unusual episode of human–wildlife conflict to identify genes that are important in this process. Specifically, we used RNA sequencing (RNA-Seq) to compare genome-wide gene expression patterns in the ovarian tissue of juvenile and adult female baboons from Amboseli National Park, Kenya. We combined this information with prior evidence of selection occurring on two primate lineages (human and chimpanzee). We found that in cases in which genes were both differentially expressed over the course of ovarian maturation and also linked to lineage-specific selection this selective signature was much more likely to occur in regulatory regions than in coding regions. These results suggest that adaptive change in the development of the primate ovary may be largely driven at the mechanistic level by selection on gene regulation, potentially in relationship to the physiology or timing of female reproductive maturation

Predicting disease-associated substitution of a single amino acid by analyzing residue interactions

Author: A del Sol
A Liaw
AM Fernandez-Escamilla
B Li
C Ferrer-Costa
C Ferrer-Costa
C Kosiol
CT Saunders
DJ Watts
G Amitai
G Bagler
H Carter
Hui Yin
J Reumers
Jiamin Xiao
JS Kaminker
JS Kaminker
KV Brinda
L Bao
L Bao
L Breman
Lezheng Yu
LH Greene
Li Yang
M Mort
M Vendruscolo
MEJ Newman
Menglong Li
NV Dokholyan
P Yue
P Yue
P Yue
PA Alexander
PC Ng
PC Ng
PD Thomas
R Karchin
RA Gibbs
RJ Dobson
S Miyazawa
S Sunyaev
SF Altschul
ST Sherry
V Ramensky
W Kabsch
W Lee
Y Bromberg
Y Bromberg
Yizhou Li
YL Yip
YL Yip
Z Wang
Zhining Wen
ZQ Ye
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The rapid accumulation of data on non-synonymous single nucleotide polymorphisms (nsSNPs, also called SAPs) should allow us to further our understanding of the underlying disease-associated mechanisms. Here, we use complex networks to study the role of an amino acid in both local and global structures and determine the extent to which disease-associated and polymorphic SAPs differ in terms of their interactions to other residues.
Results: We found that SAPs can be well characterized by network topological features. Mutations are probably disease-associated when they occur at a site with a high centrality value and/or high degree value in a protein structure network. We also discovered that study of the neighboring residues around a mutation site can help to determine whether the mutation is disease-related or not. We compiled a dataset from the Swiss-Prot variant pages and constructed a model to predict disease-associated SAPs based on the random forest algorithm. The values of total accuracy and MCC were 83.0% and 0.64, respectively, as determined by 5-fold cross-validation. With an independent dataset, our model achieved a total accuracy of 80.8% and MCC of 0.59, respectively.
Conclusions: The satisfactory performance suggests that network topological features can be used as quantification measures to determine the importance of a site on a protein, and this approach can complement existing methods for prediction of disease-associated SAPs. Moreover, the use of this method in SAP studies would help to determine the underlying linkage between SAPs and diseases through extensive investigation of mutual interactions between residues.
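
The two performance measures reported in this abstract, total accuracy and the Matthews correlation coefficient (MCC), are both computed from a binary confusion matrix. The sketch below uses the standard definitions. The counts are hypothetical and were not taken from the paper, so the resulting numbers should not be read as a reconstruction of its results.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, 1]; 0.0 if undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion matrix for 100 variants (disease-associated = positive).
tp, tn, fp, fn = 40, 43, 7, 10
acc = accuracy(tp, tn, fp, fn)
score = mcc(tp, tn, fp, fn)
print(f"accuracy = {acc:.3f}, MCC = {score:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it more informative when disease-associated and neutral variants are unevenly represented.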